Hadoop Based Link Prediction Performance Analysis

Authors

Yuxiao Dong

Casey Robinson

Jian Xu

Abstract

Link prediction is an important problem in social network analysis and has been applied in a variety of fields. It aims to estimate the likelihood that a link exists between two nodes, given the known network structure. The time complexity of link prediction algorithms in huge-scale networks remains unexplored and unsolved, especially for sparse networks. In this project, we explore how parallel computing speeds up link prediction in huge-scale networks. We implement similarity-based link prediction algorithms on MapReduce, which have a time complexity of O(n) in sparse networks. We analyze the performance of our algorithms on the Data Intensive Science Cluster at the University of Notre Dame. We evaluate the performance with different configurations, monitor the resource utilization of the distributed computation, and optimize accordingly. After analyzing the efficiency of these configurations, we present the fastest approach we found for performing parallelized link prediction, which is particularly suited to real-world big data.
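The abstract does not say which similarity measures were implemented or how the MapReduce jobs were structured. As a minimal sketch only, the following Python code simulates one map step and one reduce step in memory, assuming the common-neighbors score as the similarity measure. The function names (`map_phase`, `reduce_phase`, `common_neighbor_scores`) and the toy edge list are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(adjacency):
    """Map: for each node, emit every pair of its neighbors with a count of 1.

    Each emitted pair has the current node as one common neighbor.
    """
    for node, neighbors in adjacency.items():
        for u, v in combinations(sorted(neighbors), 2):
            yield (u, v), 1


def reduce_phase(pairs):
    """Reduce: sum the counts per pair to get its common-neighbor score."""
    scores = defaultdict(int)
    for pair, count in pairs:
        scores[pair] += count
    return dict(scores)


def common_neighbor_scores(edges):
    """Score node pairs that are not yet linked by their common neighbors."""
    # Build an undirected adjacency list from the edge list.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    scores = reduce_phase(map_phase(adjacency))
    # Keep only candidate links, i.e. pairs without an existing edge.
    return {
        (u, v): s for (u, v), s in scores.items() if v not in adjacency[u]
    }


if __name__ == "__main__":
    # Hypothetical toy graph, for illustration only.
    toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
    for pair, score in sorted(common_neighbor_scores(toy_edges).items()):
        print(pair, score)
```

In an actual Hadoop job the map and reduce steps would run as distributed tasks over partitions of the adjacency list. Here they are plain Python functions, so the sketch illustrates only the data flow, not the performance behavior studied in the paper.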


Similar Resources

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. It requires knowledge of previous link connections, combined with other available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...


Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...


Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases

We investigate search based fault prediction over time based on 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction; though G-mean tends to improve, Recall can be reduced.


A Link Prediction Method Based on Learning Automata in Social Networks

Nowadays, online social networks are considered one of the most important emerging phenomena of human societies. In these networks, link prediction relies on existing knowledge of the interactions between network actors to estimate the probability that a new relationship will form in the future. A wide range of applications can be found for link prediction such as electro...


Scalable Link Prediction in Online Social Networks

We describe a link prediction method based on a scalable community detection algorithm. It can be used to recommend new links in a real world social network with millions of users. Using a Hadoop cluster, we test our implementation on a Twitter user network containing 40 million users and 1.4 billion connections. We show that communities detected can then be used to recommend new users to follo...


Publication date: 2012
